SUGI 28: Case Studies in Time Series
ABSTRACT
This paper reviews some basic time series concepts and then demonstrates how the basic techniques can be extended and applied to some interesting data sets.

INTRODUCTION

Data taken over time often exhibit autocorrelation. This is a phenomenon in which positive deviations from a mean tend to be followed by positive deviations and negative by negative, or, in the case of negative autocorrelation, positive deviations tend to be followed by negative ones more often than would happen with independent data. While the analysis of autocorrelated data may not be included in every statistics training program, it is certainly becoming more popular, and with the development of software to implement the models, we are likely to see an increasing need to understand how to model and forecast such data. A classic set of models known as ARIMA models can be easily fit to data using the SAS procedure PROC ARIMA. In this kind of model, the observations, in deviations from an overall mean, are expressed in terms of an uncorrelated random sequence called white noise. This paper presents an overview of and introduction to some of the standard time series modeling and forecasting techniques as implemented in SAS with PROC ARIMA and PROC AUTOREG, among others. Examples are presented to illustrate the concepts. In addition to a few initial ARIMA examples, more sophisticated modeling tools will be addressed, including regression models with time series errors, intervention models, and a discussion of nonstationarity.

WHITE NOISE

The fundamental building block of time series models is a white noise series e(t). The symbol e(t) represents an unanticipated incoming “shock” to the system. The assumption is that the e(t) sequence is an uncorrelated sequence of random variables with constant variance. A simple and yet often reasonable model for observed data is

   Y(t) - μ = ρ( Y(t-1) - μ ) + e(t)

where it is assumed that ρ is less than 1 in magnitude. In this model an observation at time t, Y(t), deviates from the mean μ in a way that relates to the previous, time t-1, deviation and the incoming white noise shock e(t). As a result of this relationship, the correlation between Y(t) and Y(t-j) is ρ raised to the power |j|; that is, the correlation is an exponentially decaying function of the lag j. The goal of time series modeling is to capture, with the model parameter estimates, this correlation structure.

In the model above, known as an autoregressive model of order 1, the current Y is related to its immediate predecessor in a way reminiscent of a regression model. In fact, one way to model this kind of data is to simply regress Y(t) on Y(t-1); this is a type of “conditional least squares” estimation. Once this is done, the residuals from the regression should mimic the behavior of the true errors e(t). The residuals should appear to be an uncorrelated sequence, that is, white noise.

The term “lag” is used often in time series analysis. To understand that term, consider a column of numbers starting with Y(2) and ending with Y(n), where n is the number of observations available. A corresponding column beginning with Y(1) and ending with Y(n-1) would constitute the “lag 1” values of that first column. Similarly, lag 2, 3, 4 columns could be created. To see if a column of residuals, r(t), is a white noise sequence, one might compute the correlations between r(t) and various lag values r(t-j) for j=1,2,...,k.
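As a minimal sketch of this “regress Y(t) on Y(t-1)” idea, one way it could be coded in SAS is shown below. This fragment is illustrative only and is not taken from the paper; the data set WORK.SERIES and the variable Y are assumed placeholder names.

   data lagged;
      set series;               /* hypothetical input data set with variable Y  */
      y_lag1 = lag(y);          /* Y(t-1); missing for the first observation    */
   run;

   proc reg data=lagged;
      model y = y_lag1;         /* slope estimates rho; intercept is mu(1-rho)  */
      output out=resid r=r;     /* residuals should behave like white noise e(t)*/
   run;
   quit;

   proc arima data=resid;
      identify var=r nlag=12;   /* autocorrelation check on the residuals       */
   run;
   quit;

The final IDENTIFY step simply computes the residual autocorrelations at lags 1 through 12, which is exactly the check described above.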
If there is no true autocorrelation, these k estimated autocorrelations will be approximately normal with mean 0 and variance 1/n. Taking n times the sum of their squares produces a statistic Q having a Chi-square distribution in large samples. A slight modification of this formula is used in PROC ARIMA as a test for white noise. Initially, the test is performed on residuals that are just deviations from the sample mean. If the white noise null hypothesis is rejected, the analyst goes on to model the series, and for each model another Q is calculated on the model residuals. A good model should produce white noise residuals, so Q tests the null hypothesis that the model currently under consideration is adequate.

AUTOREGRESSIVE MODELS

The model presented above is termed “autoregressive” as it appears to be a regression of Y(t) on its own past values. It is of order 1 since only one previous Y is used to model Y(t). If additional lags are required, it would be called autoregressive of order p, where p is the number of lags in the model. An autoregressive model of order 2, AR(2), would be written, for example, as

   Y(t) - μ = α1( Y(t-1) - μ ) + α2( Y(t-2) - μ ) + e(t)

or as

   ( 1 - α1 B - α2 B² )( Y(t) - μ ) = e(t)

where B represents a “backshift operator” that shifts the time index back by 1 (from Y(t) to Y(t-1)), and thus B squared, or B times B, would shift t back to t-2.

Recall that with the order 1 autoregressive model there was a single coefficient, ρ, and yet an infinite number of nonzero autocorrelations; that is, ρ^j is not 0 for any j. For higher order autoregressive models there are again a finite number of coefficients, 2 in the example immediately above, and yet an infinite number of nonzero autocorrelations. Furthermore, the relationship between the autocorrelations and the coefficients is not at all as simple as the exponential decay that we saw for the AR(1) model. A plot of the lag j autocorrelation against the lag number j is called the autocorrelation function, or ACF. Clearly, inspection of the ACF will not show how many coefficients are required to adequately model the data.

A function that will identify the number of lags in a pure autoregression is the partial autocorrelation function, or PACF. Imagine regressing Y(t) on Y(t-1),...,Y(t-k) and recording the lag k coefficient; call this coefficient π(k). In ARIMA modeling in general, and PROC ARIMA in particular, the “regression” is done using correlations between the various lags of Y. In particular, where the matrix X’X would usually appear in the regression normal equations, substitute a matrix whose ij entry is the autocorrelation at lag |i-j|, and for the usual X’Y, substitute a vector whose jth entry is the lag j autocorrelation. Once the lag number k has passed the number of needed lags p in the model, you would expect π(k) to be 0. A standard error of 1/√n is appropriate in large samples for these estimated partial autocorrelations.
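To make the identify-then-estimate workflow concrete, a minimal PROC ARIMA sketch along these lines is given below. It is an illustrative fragment, not code from the paper; the data set WORK.SERIES and the variable Y are assumed names.

   proc arima data=series;
      identify var=y nlag=24;      /* prints ACF, PACF, and the white noise Q test  */
      estimate p=1 method=cls;     /* AR(1) fit by conditional least squares        */
      estimate p=2 method=ml;      /* AR(2) alternative; residual Q tests adequacy  */
      forecast lead=12 out=fore;   /* forecasts based on the most recent ESTIMATE   */
   run;
   quit;

For a pure autoregression, the PACF printed by the IDENTIFY statement should fall within about 2/√n of zero once the lag exceeds p, which is what guides the choice between the two ESTIMATE statements.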